API: DataFrame.to_csv formatting parameters for float indexes #11681

nbonnotte · 2015-11-23T17:38:21Z

Fix issue #11553

Two things:

I've created a Float64Index._format_native_types method, which is a copy-paste of FloatBlock.to_native_types. I would have preferred to call the latter directly, but I'm not sure what the placement parameter of the FloatBlock constructor means. I guess it doesn't really matter, since I could pass any value and it should work (I think), so my hesitation is a bit unfounded, but I don't know whether that would really be a clean solution. Maybe someone can think of a more elegant way?
Since a Float64Index containing only NaNs collapses when it is part of a multi-index, its NaN values would not be converted using na_rep, so I had to hack a solution. I put a comment in the relevant part. I'm not quite convinced of the elegance of the solution myself, though.

What do you think?

nbonnotte · 2015-11-24T10:58:46Z

And I've fixed the decimal option not being taken into account for 0.0, by replacing the %g formatting with a call to str.
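
A minimal sketch of why %g can lose the decimal separator for 0.0. The thread does not show the old formatting code, so this assumes it formatted each float with %g and then swapped the "." for the decimal character; the two helper names are hypothetical and are not pandas functions.

```python
def format_float_g(value, decimal="."):
    # Hypothetical helper: '%g' drops the trailing '.0', so 0.0 becomes '0'
    # and there is no '.' left to replace with the custom separator.
    return ("%g" % value).replace(".", decimal, 1)


def format_float_str(value, decimal="."):
    # Hypothetical helper: str() keeps the fractional part (str(0.0) == '0.0'),
    # so the separator is always present and can be replaced.
    return str(value).replace(".", decimal, 1)


print(format_float_g(0.0, decimal="^"))    # 0
print(format_float_str(0.0, decimal="^"))  # 0^0
print(format_float_g(1.1, decimal="^"))    # 1^1
print(format_float_str(1.1, decimal="^"))  # 1^1
```

Only values with no fractional part differ between the two, which matches the description that the problem showed up for 0.0.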

jreback · 2015-11-24T13:14:05Z

pandas/core/index.py

+        # if any index contains only NaNs, it has collapsed into an empty
+        # Float64Index, and when the multiindex has been recomposed
+        # the NaNs have come back as NaNs, not as strings corresponding to
+        # na_rep


this is non-performant. What exactly is the issue here?

Huh, I realize my comment is not only unclear, but also inaccurate.

Let's consider the example in my tests:

df = DataFrame({'a': [0, np.NaN], 'b': [0, 1], 'c': [2, 3]}).set_index(['a', 'b'])

The index is a multi-index, with the following levels:

Float64Index([0.0], dtype='float64', name=u'a')
Int64Index([0, 1], dtype='int64', name=u'b')

After the calls to _format_native_types, the variable levels contains:

['0.0']
['0', '1']

and afterwards mi.values contains:

[('0.0', '0'), (nan, '1')]

Here, nan is not a string "nan", but numpy.NaN (which is printed as the string nan)

Then, for that reason, in the tests I've introduced, the following test fails:

Traceback (most recent call last):
  File "/home/nicolas/Git/pandas/pandas/tests/test_format.py", line 2965, in test_to_csv_na_rep
    self.assertEqual(df.set_index(['a', 'b']).to_csv(na_rep='_'), expected)
AssertionError: 'a,b,c\n0.0,0,2\nnan,1,3\n' != 'a,b,c\n0.0,0,2\n_,1,3\n'
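
The behaviour this test is after can be checked with a short script. This is a sketch that assumes a pandas release containing this fix; it uses np.nan and splitlines() so that it does not depend on the NumPy version or the platform line terminator.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, np.nan], "b": [0, 1], "c": [2, 3]})

# With the fix, na_rep should apply to a NaN in a plain float index
# and to a NaN in a level of a multi-index alike.
single = df.set_index("a").to_csv(na_rep="_").splitlines()
multi = df.set_index(["a", "b"]).to_csv(na_rep="_").splitlines()

print(single)
print(multi)
```

Both lists should correspond to the expected string in the traceback above, with "_" in place of the NaN.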

So the way to fix this is to set both the levels and the labels: append a new value to the level (which will represent the NaNs), then change the -1 in the labels to the position of that value (1 in this case). This will then reformat the MultiIndex to work correctly.

You can do this in the MultiIndex constructor of _format_native_types

In [20]: df.index.set_levels([0.0,'_'],level=0).set_labels([0,1],level=0).values
Out[20]: array([(0.0, 0), ('_', 1)], dtype=object)
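
The levels-and-labels idea can be sketched in plain Python, without pandas. The function name is hypothetical, and the only assumption is the convention described above: a level stores the distinct values, the labels store integer positions into that level, and -1 marks a missing value. (Later pandas releases renamed labels to codes, so set_labels became set_codes.)

```python
def format_level_with_na(level_values, labels, na_rep):
    """Hypothetical helper: stringify one level and make missing labels point at na_rep."""
    new_level = [str(v) for v in level_values]
    if -1 not in labels:
        # Nothing is missing, so the labels can be reused unchanged.
        return new_level, list(labels)
    # Append na_rep as an extra level entry and remap every -1 to its position.
    na_position = len(new_level)
    new_level.append(na_rep)
    new_labels = [na_position if lab == -1 else lab for lab in labels]
    return new_level, new_labels


level, labels = format_level_with_na([0.0], [0, -1], na_rep="_")
print(level)                        # ['0.0', '_']
print(labels)                       # [0, 1]
print([level[i] for i in labels])   # ['0.0', '_']
```

Because only the level gains one entry and the labels are remapped once, the cost does not depend on scanning the recomposed tuples for NaN.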

nbonnotte · 2015-11-28T16:22:32Z

@jreback all set

jreback · 2015-11-29T17:55:21Z

pandas/core/index.py

@@ -3878,6 +3878,32 @@ def _convert_slice_indexer(self, key, kind=None):
        # translate to locations
        return self.slice_indexer(key.start, key.stop, key.step)

+    def _format_native_types(self, na_rep='', float_format=None,
+                             decimal='.', quoting=None, **kwargs):


Since this is basically identical to FloatBlock.to_native_types in core/internals, let's pull both of those out into a method on FloatArrayFormatter in core/format.py and call it .get_formatted_data(). See how that works out. The routines currently there are for screen printing (which are necessarily different from to_csv / index formatting).
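
As a rough illustration of the refactoring being proposed, here is a plain-Python sketch in which one shared formatting routine is called from both an index-like and a block-like object. All names are hypothetical stand-ins, and this is not the pandas implementation; it only shows the shape of "one formatter, two callers".

```python
import math


def format_floats(values, na_rep="", float_format=None, decimal="."):
    """Hypothetical shared helper: turn floats into strings for CSV-style output."""
    out = []
    for v in values:
        if v is None or math.isnan(v):
            # Missing values are replaced by na_rep before any number formatting.
            out.append(na_rep)
            continue
        text = float_format % v if float_format else str(v)
        if decimal != ".":
            text = text.replace(".", decimal, 1)
        out.append(text)
    return out


class FakeFloatIndex:
    """Stand-in for an index holding floats."""

    def __init__(self, values):
        self.values = values

    def format_native(self, **kwargs):
        return format_floats(self.values, **kwargs)


class FakeFloatBlock:
    """Stand-in for a block of float column data."""

    def __init__(self, values):
        self.values = values

    def to_native(self, **kwargs):
        return format_floats(self.values, **kwargs)


idx = FakeFloatIndex([0.0, float("nan")])
blk = FakeFloatBlock([1.5, 2.25])
print(idx.format_native(na_rep="_", decimal=","))         # ['0,0', '_']
print(blk.to_native(float_format="%.2f", decimal=","))    # ['1,50', '2,25']
```

With a single routine, an option such as decimal or na_rep cannot be honoured for column data and ignored for the index, which is the inconsistency this pull request addresses.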

jreback · 2015-11-29T17:55:56Z

ok, looks pretty good. I think we can take this opportunity to re-factor a bit as I have noted above.

jreback · 2015-12-06T19:15:45Z

@nbonnotte if you can update / refactor as above would be great

nbonnotte · 2015-12-07T08:12:32Z

I will, I just haven't had the time yet. If you'd like this to be done quickly, because of the schedule for the 0.18.0 release, let me know.

jreback · 2015-12-07T11:24:42Z

no, just pinging :)

jreback · 2015-12-16T14:00:11Z

lmk when you can update

nbonnotte · 2015-12-26T19:20:50Z

@jreback I just pushed the changes. Let me know if other changes are needed.

Happy Holidays!

jreback · 2015-12-26T21:26:28Z

pandas/tests/test_format.py

+        expected = "a,b,c\n_,0,2\n_,1,3\n"
+        self.assertEqual(df.set_index('a').to_csv(na_rep='_'), expected)
+        self.assertEqual(df.set_index(['a', 'b']).to_csv(na_rep='_'), expected)
+        # check if na_rep parameter does not break anything when no NaN


blank line here

jreback · 2015-12-26T21:29:54Z

lgtm. some very minor formatting changes. in general like to have blank lines between different sub-tests and to format code nicely.

ping when pushed and green.

jreback · 2015-12-26T21:30:44Z

pandas/core/format.py

@@ -2101,6 +2105,32 @@ def _format_strings(self):

        return fmt_values

+    def get_formatted_data(self):
+        values = self.values


add a doc-string here describing what this is doing. Some comments in the code as well (to describe the purposes of the if clauses)


nbonnotte · 2015-12-27T08:47:01Z

If you and Travis agree, that should be it.

I have added some comments as you suggested, to make the code clearer. But to be honest I'm not quite sure about how some bits of the code fit with the rest, for instance how and where the quoting parameter is handled. I mostly moved stuff around, and it's difficult to explain things in comments when one does not understand what is going on. I hope the comments I wrote are OK, though.

Also, I suppose FloatArrayFormatter could be refactored a bit, because right now it looks more like a potpourri: a mixture of different bits of code that are related but do not have much in common. But I'm not sure it is worth the effort.

jreback · 2015-12-27T16:22:30Z

pandas/tests/test_format.py

+
+        # same but for an index
+        self.assertEqual(
+            df.set_index('a').to_csv(decimal='^'), expected)


ok, are there tests for quoting? if not can you add a couple. thxs.

jreback · 2015-12-27T16:24:13Z

@nbonnotte code looks great. just see if we have some tests for quoting if not, pls add them.

the explanations are fine. Just want to have a note instead of a mass of code to explain a bit what is happening.

as far as refactoring, you got what I was looking for (integration between Index and data formatting). if you see additional things, don't hesitate with another PR!

ping when pushed / green (or if tests ok, lmk)

nbonnotte · 2015-12-27T17:18:53Z

Yeah, the quoting parameter is used in pandas/tests/test_format.py, at least in test_to_csv_quotechar(), test_to_csv_doublequote(), and test_to_csv_escapechar().
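
For context, a small sketch of what the quoting parameter does when the index holds floats. It is not one of the tests named above; it assumes a pandas release containing this fix and uses csv.QUOTE_ALL from the standard library, which asks the writer to quote every field.

```python
import csv

import pandas as pd

df = pd.DataFrame({"a": [1.5, 2.5], "b": ["x", "y"]})

# QUOTE_ALL quotes every field, including the float index values.
lines = df.set_index("a").to_csv(quoting=csv.QUOTE_ALL).splitlines()
print(lines)
```

Each field, header included, should come back wrapped in double quotes.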


jreback · 2015-12-27T17:24:41Z

thanks!

jreback added the Output-Formatting, API Design, and Indexing labels Nov 24, 2015

jreback reviewed Nov 24, 2015

jreback reviewed Nov 29, 2015

jreback added this to the 0.18.0 milestone Nov 29, 2015

jreback reviewed Dec 26, 2015

API: DataFrame.to_csv formatting parameters for float indexes

9302811

Fix issue #11553

jreback reviewed Dec 27, 2015

jreback added a commit that referenced this pull request Dec 27, 2015

Merge pull request #11681 from nbonnotte/to_csv-formatting-11553

f295c0a

API: DataFrame.to_csv formatting parameters for float indexes

jreback merged commit f295c0a into pandas-dev:master Dec 27, 2015

nbonnotte deleted the to_csv-formatting-11553 branch December 27, 2015 17:25

nbonnotte mentioned this pull request Dec 28, 2015

DataFrame.to_csv ignores some formatting parameters for float indexes #11553

Closed

nbonnotte mentioned this pull request Jan 28, 2016

Cleaning FloatArrayFormatter #12164

Closed
